In Proceedings of LREC-2002 Workshop Data Collection and Language Technologies for Mapudungun
Abstract
Mapudungun is spoken by over 900,000 people (Mapuche) in Chile and Argentina. Thanks to an active bilingual and multicultural education program, Mapuche children are now being taught to be literate in both Mapudungun and Spanish. The Chilean Ministry of Education (Mineduc) has teamed up with the Language Technologies Institute’s (LTI) AVENUE project to collect data and produce language technologies that support bilingual education. The main resource to come out of the Mineduc-LTI partnership is a Mapudungun-Spanish parallel corpus consisting of approximately 200,000 words of text and 120 hours of transcribed speech. Plans are being made for machine translation and computer-assisted instruction.
Similar resources
Data Collection and Analysis of Mapudungun Morphology for Spelling Correction
This paper describes part of a three-year collaboration between Carnegie Mellon University's Language Technologies Institute, the Programa de Educación Intercultural Bilingüe of the Chilean Ministry of Education, and Universidad de La Frontera (Temuco, Chile). We are currently constructing a spelling checker for Mapudungun, a polysynthetic language spoken by the Mapuche people in Chile and Arge...
Message from the Program Chair
The Third NTCIR Workshop is the third venture in a series of evaluation workshops designed to enhance research in information access technologies including text retrieval, cross language information retrieval, automatic text summarization, information extraction, and question answering on Japanese and Asian language text. The goals of the NTCIR Workshops are as follows: 1. to encourage research...
Building NLP Systems for Two Resource-Scarce Indigenous Languages: Mapudungun and Quechua
By adopting a “first-things-first” approach we overcome a number of challenges inherent in developing NLP systems for resource-scarce languages. By first gathering the necessary corpora and lexicons we are then enabled to build, for Mapudungun, a spelling corrector, morphological analyzer, and two Mapudungun-Spanish machine translation systems; and for Quechua, a morphological analyzer as well as...
Towards an International Standard on Feature Structure Representation
This paper describes the preliminary results of a joint initiative of the TEI (Text Encoding Initiative) Consortium and the ISO Committee TC 37/SC 4 (Language Resource Management) to provide a standard for the representation and interchange of feature structures. The paper published in the proceedings of this workshop is in fact an extension of a paper published in the LREC 2004 proceedings, and...